Abstract

Computer performance modeling of possibly complex computations running on highly
concurrent systems is considered. Earlier works in this area either dealt with a very
simple program structure or resulted in methods with exponential complexity. An
efficient procedure is developed to compute the performance measures for
series-parallel-reducible task systems using queueing network models. The procedure
is based on the concept of hierarchical decomposition and a new operational approach.
Numerical results for three test cases are presented and compared to those of
simulations.


1 Introduction

As interest in highly concurrent system research expands, a means of predicting the
probable performance of application computations on such systems would be an
important tool for evaluating how effectively these systems are being utilized.
Unfortunately, most existing analytical approaches either are not suitable for studying
parallel systems or are computationally intractable. Simulation approaches are also
unrealistic if the systems being studied are much more capable than the systems
supporting the simulation.

This report describes a computationally efficient procedure for predicting the
probable performance of a possibly complex computation on a highly concurrent system
using queueing network models. The computation is specified by a
series-parallel-reducible task system which consists of a set of tasks related by a
deterministic precedence graph. Each task is characterized by its expected total
loadings (or service demand) on the resources of the computer system. The procedure
presented here can be used to determine key performance measures such as mean
execution time for each task, mean completion time for the task system, and
utilizations for the resources in the computer system.

Heidelberger and Trivedi [HT82,HT83] have used analytic queueing models to predict
performance for programs with internal concurrency. In [HT82], they considered
systems in which a parent task subdivides into two or more tasks which require no
synchronization with the parent task. In [HT83], a task can spawn two or more
concurrent tasks but has to wait for their completion before it can proceed. Both papers
considered only very simple task systems. The task systems considered in this report
are much more complicated and include all concurrent task systems programmed using
block-oriented constructs like cobegin, coend, DOALL, fork, join, etc.

Thomasian and Bay [TB83] considered a more general task system with deterministic
precedence constraints expressed as a directed acyclic graph. Their method is
based on the concept of hierarchical decomposition [Cou77]. At the higher level, a
Markov chain corresponding to the transitions among system states is generated. At
the lower level, the transition rates among the states are computed using a queueing
network solver. Their method is a significant improvement over the exact method
using a Markov chain alone. However, the number of states in their method is still too
high for large systems. For a system with N tasks, in the worst case, there may be as
many as $2^N$ states in their higher level Markov chain.

Mohan in his thesis [Moh84] considered task structures similar to those in [TB83]. He
also made use of queueing network models to find the throughput of system states in the
lower level. In the higher level, he used simulation to determine the mean completion
time for the system by tracing different execution paths. For a system with N tasks, there
may also be as many as $2^N$ execution paths. Hence, it requires at least the same
complexity as the method in [TB83].

The task system model used in the method described later is a subset of that
considered in [TB83]: it has to be series-parallel-reducible. This method is also based
on the concept of hierarchical decomposition, but instead of forming a Markov chain
and computing state probabilities, an operational approach is used to determine
performance measures directly from measurable quantities. This approach reduces the
complexity of the method to polynomial.

In section 2 of this report, the task system and computer system models used in 
this procedure are described. The operational approach is presented in section 3. The 
three key steps in the procedure: estimation of contention, estimation of task execution 
time, and estimation of system completion time, are discussed in sections 3.1, 3.2, and 
3.3, respectively. This procedure has been validated through simulation. Three test 
cases are presented and the numerical results are compared to those of simulations in 
section 4. 


2 Task System and Computer System Models 

The task system model used in this report closely follows the one defined in [TB83],
with one additional constraint: the precedence graph is series-parallel-reducible
(see Fig. 1). A task system is specified by a 3-tuple $(T, [\prec], [D_{nk}])$ as follows:

1. $T = (T_1, T_2, \ldots, T_N)$ is a set of tasks to be executed on the computer system.
Except for queueing effects, the tasks execute independently of each other.

2. $[\prec]$ is a partial order defined on $T$ specifying deterministic precedence constraints.
$T_i \prec T_j$ means that $T_i$ must be completed before $T_j$ can begin. Only
series-parallel-reducible directed acyclic graphs are considered.

3. $[D_{nk}]$ is an $N \times K$ matrix, such that $D_{nk}$ is the service demand (expected total
loadings) of task n on resource k.
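
The 3-tuple above maps naturally onto a small data structure. The following is a minimal sketch, not the report's implementation: the class name `TaskSystem`, its method names, and the four-task fork-join example are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class TaskSystem:
    """Hypothetical container for the 3-tuple (T, precedence, D)."""
    tasks: List[str]                  # T
    precedes: Set[Tuple[str, str]]    # (a, b) means a must finish before b starts
    demand: Dict[str, List[float]]    # demand[n][k]: demand of task n on resource k

    def predecessors(self, task: str) -> Set[str]:
        """Direct predecessors of a task in the precedence graph."""
        return {a for (a, b) in self.precedes if b == task}

    def eligible(self, done: Set[str]) -> List[str]:
        """Unfinished tasks whose direct predecessors have all completed."""
        return [t for t in self.tasks
                if t not in done and self.predecessors(t) <= done]


# Hypothetical fork-join example: T1 forks T2 and T3, which join into T4.
fork_join = TaskSystem(
    tasks=["T1", "T2", "T3", "T4"],
    precedes={("T1", "T2"), ("T1", "T3"), ("T2", "T4"), ("T3", "T4")},
    demand={t: [0.420, 0.400, 0.400] for t in ["T1", "T2", "T3", "T4"]},
)
print(fork_join.eligible(set()))     # tasks that can start immediately
print(fork_join.eligible({"T1"}))    # tasks in progress once T1 completes
```

Each result of `eligible` corresponds to one task combination in progress at the computer system, under the report's assumption that tasks execute as soon as their precedence constraints are satisfied.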

The computer system is modeled as a central server queueing network (see Fig. 2) . 
Each resource in the system is modeled as a service center. During the course of 
execution of a task system, the computer system processes different combinations of 
tasks according to the precedence constraints until all tasks are completed. Each 
task combination in progress at the computer system can be represented as a closed 
queueing network with multiple job types. Tasks are assumed to execute as soon as 
their precedence constraints are satisfied. Other task scheduling disciplines are also 
possible but are not considered in this report. To simplify discussion, the queueing
network model is assumed to be separable and to have a product-form solution [LZGS84].

Figure 1: A Series-Parallel-Reducible Task System

Figure 2: A Concurrent Computer System


3 An Operational Approach

One of the most difficult problems in predicting the performance of a concurrent task
system is that service demands on system resources are probabilistic. This results in
probabilistic task execution times and hence different possible execution paths for the
same task system. Each execution path has its own path probability and a
corresponding system completion time. To determine the mean system completion
time exactly, all these possible execution paths must be taken into account. Earlier
efforts have tried to use either mathematical analysis [TB83] or simulation [Moh84] to
find all possible execution paths and their path probabilities. Although accurate, those
methods are infeasible for large task systems due to their exponential complexity.

The main difference between the operational approach described here and earlier
efforts is that task mean execution times and system mean completion time are
estimated directly, without tracing all possible execution paths. In [DB78], Denning
and Buzen used an operational approach for the analysis of queueing networks.
Instead of using stochastic models to compute performance measures, algebraic
relationships among measurable quantities (such as throughput and response time) are
derived. Using these relationships, performance measures of queueing network models
can be computed directly without resorting to stochastic models. Likewise, in
this procedure, some algebraic relationships are derived among measurable quantities,
such as task mean initiation times, task mean execution times, system mean
completion time, and task service demands. Using this set of relationships, the
performance of the system can be estimated directly without tracing all execution paths.

This approach is based on the concept of hierarchical decomposition [Cou77], i.e.,
the system is assumed to reach equilibrium between task initiation and completion
instants. As mentioned in the last section, each task combination in progress at the
computer system is represented as a closed queueing network with multiple job types.
Because of the decomposition approximation, resource utilizations and mean queue
lengths can be assumed to reach their steady state values during this interval.

The execution time of a task consists of two components: its actual service time by
the resources in the computer system, and the waiting time spent at the queues. The
former is the service demand and is given in $[D_{nk}]$. The latter depends on the amount
of contention experienced in the system, which in turn depends on the loadings of the
system during the task's execution interval. By comparing with mean initiation times
and mean completion times of other tasks in the system, the amount of contention
experienced by a task during its execution interval can be estimated. With the amount
of contention, the mean execution time of a task can be estimated. Knowing all task
mean execution times, the mean initiation time for each task and the mean completion
time for the task system can be estimated by reducing the series-parallel precedence
graph. With the new set of mean initiation times and mean execution times, a better
estimate of the amount of contention can be obtained. The iterative process continues
until the estimate of the mean completion time converges.

Following is the outline of the iterative procedure: 

1. Initialize the system as contention free, i.e., set task execution times equal to
service demands.

2. Estimate amount of contention experienced by each task during its execution 
interval. 

3. Estimate mean execution times for all tasks using the contention found in the 
previous step. 

4. Estimate mean initiation times for all tasks and mean completion time for the 
task system by reducing the series-parallel precedence graph. 

5. Repeat steps 2, 3, and 4 until successive estimates of mean completion time 
converge to within some tolerance (say 0.1%). 

Steps 2, 3, and 4 are the key components of this iterative procedure. They are
discussed in more detail in the next three sections. The complexity of each iteration
is $O(N^3 K)$, where N is the number of tasks and K is the number of resources in the
system.
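
The five-step outline above can be sketched as a fixed-point loop. This is only an illustrative skeleton: the function name `iterate` and the two callables `estimate_contention` and `reduce_graph` are hypothetical placeholders for steps 2 and 4, and the stub callables at the bottom exist solely to make the sketch runnable.

```python
def iterate(demand, estimate_contention, reduce_graph, tol=1e-3, max_iter=100):
    """Fixed-point iteration over steps 1-5 of the outline.

    demand[i][k]                      -- service demand D_ik (input)
    estimate_contention(e_res, init)  -- hypothetical callable returning A[i][k]
    reduce_graph(e_tot)               -- hypothetical callable returning
                                         (initiation_times, completion_time)
    """
    # Step 1: contention-free start, execution times equal service demands.
    e_res = [list(row) for row in demand]
    e_tot = [sum(row) for row in e_res]
    init, completion = reduce_graph(e_tot)

    for iteration in range(1, max_iter + 1):
        # Step 2: contention seen by each task at each resource.
        a = estimate_contention(e_res, init)
        # Step 3: per-resource and total mean execution times.
        e_res = [[d * (1.0 + q) for d, q in zip(d_row, a_row)]
                 for d_row, a_row in zip(demand, a)]
        e_tot = [sum(row) for row in e_res]
        # Step 4: initiation times and completion time from the graph.
        init, new_completion = reduce_graph(e_tot)
        # Step 5: stop when the relative change in completion time is within tol.
        if abs(new_completion - completion) <= tol * new_completion:
            return new_completion, e_tot, init, iteration
        completion = new_completion
    raise RuntimeError("no convergence within max_iter iterations")


# Stub callables (assumptions for illustration): two tasks in series, no contention.
def no_contention(e_res, init):
    return [[0.0 for _ in row] for row in e_res]


def two_in_series(e_tot):
    return [0.0, e_tot[0]], e_tot[0] + e_tot[1]


result = iterate([[1.0], [2.0]], no_contention, two_in_series)
print(result)
```

The default `tol=1e-3` mirrors the 0.1% tolerance suggested in step 5; `max_iter` is an added safeguard that the outline does not mention.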


3.1 Estimation of Contention 

The amount of contention experienced by a task during its execution interval depends 
on the number of tasks competing with it for system resources. If there is no other task 
executing concurrently with it, it can obtain service immediately at every resource, 
and its execution time is just equal to its service demand. On the other hand, if there 
are a lot of concurrent tasks, each task spends more time waiting at the queues for 
service. The amount of contention can be estimated in terms of the arrival instant
queue length, $A_{ik}$, which is the mean queue length (including the one in service) as
seen by task i when it first arrives at resource k. If $Q_{nk}$ is the steady state queue
length of task n at resource k, and $W_{in}$ is the fraction of task i's execution interval
during which tasks i and n overlap, then $A_{ik}$ can be computed as follows:

$$A_{ik} = \sum_{n=1,\, n \neq i}^{N} Q_{nk} W_{in} \qquad (1)$$

$Q_{nk}$ is equal to $E_{nk}$, the time task n spends at resource k, expressed as a fraction
of $E_n$, task n's execution interval:

$$Q_{nk} = \frac{E_{nk}}{E_n} \qquad (2)$$

Let $P_{in}$ be the probability that tasks i and n overlap, and $d_{in}$ be the mean duration
of the overlapping interval if tasks i and n overlap, then

$$W_{in} = \frac{P_{in}\, d_{in}}{E_i} \qquad (3)$$

Substituting equations 2 and 3 into equation 1, we get

$$A_{ik} = \frac{1}{E_i} \sum_{n=1,\, n \neq i}^{N} \frac{E_{nk}\, P_{in}\, d_{in}}{E_n} \qquad (4)$$


Consider task i with mean initiation time, $I_i$, and mean completion time, $C_i$, and
task n with mean initiation time, $I_n$, and mean completion time, $C_n$. Tasks i and n
will overlap with each other unless i initiates after n has completed, or vice versa.
Therefore,

$$P_{in} = 1 - \Pr(C_i < I_n) - \Pr(C_n < I_i) \qquad (5)$$

If A and B are two independent non-negative continuous random variables, then

$$\Pr(A < B) = \int_0^\infty \Pr(B > x)\, f_A(x)\, dx = \int_0^\infty [1 - F_B(x)]\, f_A(x)\, dx \qquad (6)$$

where $f_A(x)$ is the probability density function of A and $F_B(x)$ is the probability
distribution function of B.

Equation 6 involves both the distribution and density functions of A and B, and
is very expensive to compute. It is computationally more economical to consider only
the means and variances of the initiation and completion times. Assume that A and B
are both Erlang distributed with parameters $(\lambda_A, r_A)$ and $(\lambda_B, r_B)$, respectively.
Substituting these parameters into equation 6, and after simplification (see Appendix A),

$$\Pr(A < B) = \left(\frac{\lambda_A}{\lambda_A + \lambda_B}\right)^{r_A} \sum_{k=0}^{r_B - 1} \left(\frac{\lambda_B}{\lambda_A + \lambda_B}\right)^{k} \frac{(r_A + k - 1)!}{(r_A - 1)!\, k!} \qquad (7)$$
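
Equation 7 is a finite sum and is cheap to evaluate. The sketch below assumes integer Erlang shape parameters; the function names are hypothetical. The moment-matching helper `erlang_from_moments` is an assumption of this sketch (the report does not state how the Erlang parameters are obtained from a mean and variance); it uses the standard Erlang relations mean = r/λ and variance = r/λ², with r rounded to an integer of at least 1.

```python
from math import factorial


def prob_less(lam_a, r_a, lam_b, r_b):
    """Pr(A < B) for independent Erlang A ~ (lam_a, r_a) and B ~ (lam_b, r_b), eq. 7."""
    s = lam_a + lam_b
    total = 0.0
    for k in range(r_b):
        coeff = factorial(r_a + k - 1) / (factorial(r_a - 1) * factorial(k))
        total += (lam_b / s) ** k * coeff
    return (lam_a / s) ** r_a * total


def erlang_from_moments(mean, var):
    """Assumed moment matching: shape r = round(mean^2 / var) >= 1, rate = r / mean."""
    r = max(1, round(mean * mean / var))
    return r / mean, r


# With r_A = r_B = 1 both variables are exponential and eq. 7 reduces to
# lam_a / (lam_a + lam_b).
print(prob_less(2.0, 1, 3.0, 1))
print(erlang_from_moments(2.0, 2.0))
```

Because A and B are continuous, `prob_less(a, ra, b, rb) + prob_less(b, rb, a, ra)` should equal 1 up to rounding, which is a convenient sanity check on any implementation of equation 7.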


Notice that the mean initiation and completion times of tasks i and n in equation 5
are not independent of each other. Although tasks execute independently of each
other, they may still have a common ancestor chain in the precedence graph. If t is
the duration of their common ancestor chain, then it has to be subtracted from their
mean initiation and completion times before equation 7 can be used to compute the
probabilities.

Task execution times are assumed to be exponentially distributed. This is
asymptotically true if the task cycles back through the queueing network with probability p
and exits with probability (1 - p), and if p is close to 1 [LS79]. Let $\lambda_i$ ($= 1/E_i$)
and $\lambda_n$ ($= 1/E_n$) be the parameters of the execution time distributions of tasks i
and n, respectively. When tasks i and n overlap, the overlapping region will end when
either task terminates. Since i and n are two independent tasks with exponentially
distributed execution times, using the memoryless property of the exponential
distribution function, the duration of the overlapping region is also exponentially
distributed with parameter equal to $\lambda_i + \lambda_n$. Therefore, the mean duration of the
overlapping region is

$$d_{in} = \frac{1}{\lambda_i + \lambda_n} \qquad (8)$$


In summary, using equations 4, 5, and 8, we can determine the arrival instant 
queue length for each task at each resource in the system, and hence the amount of 
contention experienced by each task. 
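
A minimal sketch of how equations 3, 4, and 8 combine is given below. The function names are hypothetical, and the overlap probabilities $P_{in}$ are taken as a precomputed input matrix (they would come from equations 5 and 7); the two-task inputs are made-up values for illustration.

```python
def overlap_duration(e_i, e_n):
    """Mean overlap duration d_in from eq. 8, using rates 1/E_i and 1/E_n."""
    return 1.0 / (1.0 / e_i + 1.0 / e_n)


def arrival_queue_lengths(e_res, p_overlap):
    """Arrival instant queue lengths A[i][k] from eq. 4.

    e_res[n][k]     -- time E_nk that task n spends at resource k
    p_overlap[i][n] -- overlap probability P_in (assumed precomputed via eqs. 5 and 7)
    """
    n_tasks = len(e_res)
    n_res = len(e_res[0])
    e_tot = [sum(row) for row in e_res]            # E_n = sum over k of E_nk
    a = [[0.0] * n_res for _ in range(n_tasks)]
    for i in range(n_tasks):
        for n in range(n_tasks):
            if n == i:
                continue
            # Eq. 3: fraction of task i's interval during which n overlaps it.
            w_in = p_overlap[i][n] * overlap_duration(e_tot[i], e_tot[n]) / e_tot[i]
            for k in range(n_res):
                # Eq. 2 gives Q_nk = E_nk / E_n; eq. 1 accumulates Q_nk * W_in.
                a[i][k] += (e_res[n][k] / e_tot[n]) * w_in
    return a


# Two identical tasks that always overlap, each spending 1 time unit at one resource.
print(arrival_queue_lengths([[1.0], [1.0]], [[0.0, 1.0], [1.0, 0.0]]))
```

In this example each task sees the other for half of its own execution interval, so the arrival instant queue length is 0.5 for both tasks.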


3.2 Estimation of Task Mean Execution Time 

The mean execution time of task i at resource k can be estimated using the following
equation [RL80],

$$E_{ik} = D_{ik}\,(1 + A_{ik}) \qquad (9)$$

$D_{ik}$ is the service demand of task i on resource k and is given as an input parameter
of the task system. $A_{ik}$, the arrival instant queue length, is computed as described in
the last section. The mean execution time for task i is then

$$E_i = \sum_{k=1}^{K} E_{ik} \qquad (10)$$
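
Equations 9 and 10 amount to one multiplication per task-resource pair followed by a row sum. In the short sketch below, the function name is hypothetical and the contention values of 0.5 are made-up inputs; the demand row (0.420, 0.400, 0.400) is that of task 1 in Table 1.

```python
def execution_times(demand, a):
    """Return (E_ik matrix, E_i list) from demands D_ik and queue lengths A_ik."""
    e_res = [[d * (1.0 + q) for d, q in zip(d_row, a_row)]     # eq. 9
             for d_row, a_row in zip(demand, a)]
    e_tot = [sum(row) for row in e_res]                         # eq. 10
    return e_res, e_tot


demand = [[0.420, 0.400, 0.400]]          # task 1 of Table 1: CPU, DISK1, DISK2
_, free = execution_times(demand, [[0.0, 0.0, 0.0]])      # contention free
_, loaded = execution_times(demand, [[0.5, 0.5, 0.5]])    # hypothetical A_ik = 0.5
print(free, loaded)
```

With no contention the execution time equals the total service demand (1.220 here); an arrival instant queue length of 0.5 at every resource inflates it by 50%.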


3.3 Estimation of System Mean Completion Time 

After finding the mean execution time for each task, the next step is to make use 
of the series-parallel-reducible precedence graph to determine mean initiation times 
and mean completion time for the system. Because of the special structure of the 
precedence graph, these values can be estimated by reducing the series-parallel graph, 
i.e., finding the equivalent execution time for tasks in series and in parallel [Kle85].
The whole task system can be reduced to a single equivalent execution time, which is
the mean completion time for the system. Mean initiation time for a task is actually
the mean completion time for the task subsystem consisting of all of its ancestors.

Execution time for a task is not a fixed number, but a random variable with some 
distribution. As explained in section 3.1, task execution time is assumed to have an 
exponential distribution. Task execution times are also assumed to be independent of 
each other except for queueing effect. 

Assume that tasks i and j have probability distribution functions $F_i$ and $F_j$, and
density functions $f_i$ and $f_j$, respectively. If i and j are in series, then the equivalent
probability density function of their series combination will be the convolution of the
two density functions,

$$f_{eq} = f_i * f_j \qquad (11)$$

If i and j are in parallel, then the equivalent probability distribution function of
their parallel combination will be the product of the two distribution functions,

$$F_{eq} = F_i\, F_j \qquad (12)$$

However this approach is very expensive computationally, since convolution and 
integration have to be performed numerically. A more economical way to find the series 
and parallel equivalences is to consider only the means and variances of distributions. 

Let $E_i$ and $V_i$ be the mean and variance of the execution time of task subsystem
i, respectively. If task subsystems i and j are in series, then using the Central Limit
Theorem, the equivalent mean and variance of the series combination are:

$$E_{eq} = E_i + E_j \qquad (13)$$

and

$$V_{eq} = V_i + V_j \qquad (14)$$

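For task subsystems in parallel, the determination of the equivalent mean and
variance is a little more complicated than in the serial case. As in section 3.1, assume
that the execution time of task subsystem i is Erlang distributed with parameters
$(\lambda_i, r_i)$, and that of j with parameters $(\lambda_j, r_j)$. These parameters are substituted into
equation 12 to solve for $F_{eq}$, which can then be used to solve for the equivalent mean
and variance (see Appendix B). After simplification, the equivalent mean and variance
can be shown to be:

$$E_{eq} = E_i + E_j
 - \frac{\lambda_i^{r_i}}{(\lambda_i + \lambda_j)^{r_i + 1}} \sum_{k=0}^{r_j - 1} \left(\frac{\lambda_j}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_i + k)!}{(r_i - 1)!\, k!}
 - \frac{\lambda_j^{r_j}}{(\lambda_i + \lambda_j)^{r_j + 1}} \sum_{k=0}^{r_i - 1} \left(\frac{\lambda_i}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_j + k)!}{(r_j - 1)!\, k!} \qquad (15)$$

$$V_{eq} = E_i^2 + E_j^2 + V_i + V_j - E_{eq}^2
 - \frac{\lambda_i^{r_i}}{(\lambda_i + \lambda_j)^{r_i + 2}} \sum_{k=0}^{r_j - 1} \left(\frac{\lambda_j}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_i + k + 1)!}{(r_i - 1)!\, k!}
 - \frac{\lambda_j^{r_j}}{(\lambda_i + \lambda_j)^{r_j + 2}} \sum_{k=0}^{r_i - 1} \left(\frac{\lambda_i}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_j + k + 1)!}{(r_j - 1)!\, k!} \qquad (16)$$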

Equations 13 to 16 are used to reduce the precedence graph to obtain mean initia- 
tion times and mean completion time for the task system. This set of newly computed 
values can then be used to estimate the contention in the system. 
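
The series and parallel reductions of equations 13 to 16 can be sketched as two functions on (mean, variance) pairs. This is an illustration under stated assumptions rather than the report's code: the function names are hypothetical, and the Erlang parameters are obtained by an assumed moment-matching rule (shape r = round(mean²/variance), at least 1; rate = r/mean) that the report does not spell out.

```python
from math import factorial


def series(m1, m2):
    """Eqs. 13 and 14: means and variances add for subsystems in series."""
    (e1, v1), (e2, v2) = m1, m2
    return e1 + e2, v1 + v2


def erlang_fit(mean, var):
    """Assumed moment matching to Erlang (rate, shape)."""
    r = max(1, round(mean * mean / var))
    return r / mean, r


def _correction(lam_a, r_a, lam_b, r_b, p):
    """Integral of x^p * S over [0, inf): p = 1 gives eq. 15 terms, p = 2 eq. 16 terms."""
    s = lam_a + lam_b
    total = sum((lam_b / s) ** k * factorial(r_a + k + p - 1)
                / (factorial(r_a - 1) * factorial(k))
                for k in range(r_b))
    return lam_a ** r_a / s ** (r_a + p) * total


def parallel(m1, m2):
    """Eqs. 15 and 16: mean and variance of the maximum of two Erlang variables."""
    (e1, v1), (e2, v2) = m1, m2
    li, ri = erlang_fit(e1, v1)
    lj, rj = erlang_fit(e2, v2)
    e_eq = e1 + e2 - _correction(li, ri, lj, rj, 1) - _correction(lj, rj, li, ri, 1)
    second = (e1 * e1 + v1 + e2 * e2 + v2
              - _correction(li, ri, lj, rj, 2) - _correction(lj, rj, li, ri, 2))
    return e_eq, second - e_eq * e_eq


# Two exponential tasks (variance = mean^2) with mean 1.825 running in parallel.
print(parallel((1.825, 1.825 ** 2), (1.825, 1.825 ** 2)))
```

For two identical exponential tasks the parallel mean is 1.5 times the individual mean, so the example yields 1.5 × 1.825 = 2.7375. This appears to agree with the 2.738 entry in the Operational column of Table 2, presumably the initiation time of a task that waits for two such tasks, although that reading of the table is an inference.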

The mean completion time, C, can also be used to determine the utilizations and
mean queue lengths of the resources in the system. For a system with N tasks, the
utilization of resource k is

$$U_k = \frac{1}{C} \sum_{i=1}^{N} D_{ik} \qquad (17)$$

The mean queue length of resource k is

$$Q_k = \frac{1}{C} \sum_{i=1}^{N} E_{ik} \qquad (18)$$
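
As a worked illustration of the utilization formula (total service demand at a resource divided by the mean completion time C), the sketch below uses the service demands of Table 1 and the operational estimate C = 6.235 from Table 2. The function names are hypothetical; `mean_queue_lengths` applies the same averaging to the per-resource execution times $E_{ik}$, which is how the mean queue length formula is read here.

```python
def utilizations(demand, completion):
    """U_k = (sum over tasks of D_ik) / C for every resource k."""
    n_res = len(demand[0])
    return [sum(row[k] for row in demand) / completion for k in range(n_res)]


def mean_queue_lengths(e_res, completion):
    """Q_k = (sum over tasks of E_ik) / C for every resource k."""
    n_res = len(e_res[0])
    return [sum(row[k] for row in e_res) / completion for k in range(n_res)]


# Table 1 service demands (CPU, DISK1, DISK2) for tasks 1 to 6.
table1 = [
    [0.420, 0.400, 0.400],
    [0.420, 0.400, 0.400],
    [0.620, 0.600, 0.600],
    [0.620, 0.600, 0.600],
    [0.420, 0.400, 0.400],
    [0.420, 0.400, 0.400],
]
print(utilizations(table1, 6.235))   # CPU is about 0.47, each disk about 0.45
```

The CPU carries a total demand of 2.92 and each disk 2.80, so the CPU utilization is 2.92 / 6.235 and each disk utilization is 2.80 / 6.235.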

4 Test Cases and Validation 

In this section, three test cases are used to evaluate the accuracy of the above procedure 
by comparing the numerical results to those of simulations. In the above method, since 
the queueing network is separable and has a product-form solution, the specification 
of the task system model requires only the expected total loading (the product of the 
mean number of visits to the resource and the mean service time of the resource) of 
each task on each resource. However, for simulation purposes, both the expected total
loading and the mean service time of the resource have to be specified. The simulator
was written using CSIM, a C-based process-oriented simulation language developed
by Herb Schwetman at MCC [Sch86].

The first test case is a 6-task system running on a uniprocessor system. This
is the same example used in [TB83] and is chosen for comparison purposes. Both
the second and the third test cases use a shared memory multiprocessor model in
which there are 4 processors and 4 memory units connected by some interconnection
network. The second test case has a master-slave task structure that corresponds to
computations written in concurrent programming languages with constructs like fork
and join. The third test case has a partitioning task structure which is common for
divide-and-conquer algorithms. All simulations for these three test cases were run for
5000 replications.

Figure 4: Uniprocessor System

4.1 6-Task System 

The first test case is a 6-task system (Fig. 3) as used in [TB83]. The uniprocessor
model (Fig. 4) used consists of one CPU (service time 0.020) and two identical disks
(service time 0.040). The service demands of the 6-task system on the uniprocessor
are shown in Table 1. The operational method takes only 3 iterations to converge to
the final estimates. The results are shown in Table 2.
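
The Error column in the result tables appears to be the signed relative difference between the operational estimate and the simulation result, taken relative to the simulation. That definition is inferred from the tabulated numbers rather than stated in the text, and the function name below is hypothetical.

```python
def relative_error_percent(estimate, reference):
    """Signed error of the estimate relative to the reference, in percent."""
    return 100.0 * (estimate - reference) / reference


# Mean completion time C from Table 2: operational 6.235, simulation 6.140.
print(round(relative_error_percent(6.235, 6.140), 2))
# Mean execution time of task 2 from Table 2: operational 1.825, simulation 1.876.
print(round(relative_error_percent(1.825, 1.876), 2))
```

Rounded to two decimals these reproduce the 1.55% and -2.72% entries of Table 2, which supports the inferred definition.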

4.2 Master-Slave System 

The second test case has a master-slave task structure (Fig. 5) that is common for
computations written in concurrent languages with constructs like fork and join. The
master task spawns off a number of slave tasks and waits for their completion before
proceeding. Two cycles of this synchronization pattern are shown in this test case. The
Table 1: Service Demands of 6-Task System on Uniprocessor

Task    CPU      DISK1    DISK2
1       0.420    0.400    0.400
2       0.420    0.400    0.400
3       0.620    0.600    0.600
4       0.620    0.600    0.600
5       0.420    0.400    0.400
6       0.420    0.400    0.400


Table 2: Results of 6-Task System

Measure          Operational    Simulation    Error
C                6.235          6.140          1.55%
E_1              1.825          1.795          1.67%
E_2              1.825          1.876         -2.72%
E_3              2.233          2.222          0.50%
E_4              2.470          2.469          0.04%
E_5              1.716          1.705          0.65%
E_6              1.716          1.720         -0.23%
I_1, I_2, I_4    0.000          0.000          0.00%
I_3              2.738          2.643          3.59%
I_5, I_6         2.470          2.469          0.04%
Figure 5: Master-Slave System

computer system model used is a shared memory multiprocessor system (Fig. 6). It has
4 processors (service time 0.020) and 4 memory units (service time 0.020) connected 
by 2 interconnection networks (modeled as delay centers with delay time 0.001). The 
service demands are shown in Table 3. Three iterations are required by the operational 
method to converge to the final estimates. The results are shown in Table 4. 

4.3 Divide-And-Conquer System

The third test case has a partitioning task structure (Fig. 7) that is common for
computations using divide-and-conquer algorithms. The computer system model used
is the same as in the last test case, as shown in Fig. 6. The service demands of this
divide-and-conquer system on the multiprocessor are shown in Table 5. Only four
iterations are required to converge to the final estimates. The results are shown in
Table 6.

Figure 6: Shared Memory Multiprocessor System


Table 3: Service Demands of Master-Slave System on Multiprocessor

Task       P1      P2      P3      P4      IN1     IN2     M1      M2      M3      M4
Master1    0.520   -       -       -       0.025   0.025   0.125   0.125   0.125   0.125
S1a        0.620   -       -       -       0.030   0.030   0.150   0.150   0.150   0.150
S1b        0.520   -       -       -       0.025   0.025   0.125   0.125   0.125   0.125
S1c        -       0.620   -       -       0.030   0.030   0.150   0.150   0.150   0.150
S1d        -       0.520   -       -       0.025   0.025   0.125   0.125   0.125   0.125
S1e        -       -       0.620   -       0.030   0.030   0.150   0.150   0.150   0.150
S1f        -       -       0.520   -       0.025   0.025   0.125   0.125   0.125   0.125
S1g        -       -       -       0.620   0.030   0.030   0.150   0.150   0.150   0.150
S1h        -       -       -       0.520   0.025   0.025   0.125   0.125   0.125   0.125
Master2    0.520   -       -       -       0.025   0.025   0.125   0.125   0.125   0.125
S2a        0.620   -       -       -       0.030   0.030   0.150   0.150   0.150   0.150
S2b        0.620   -       -       -       0.030   0.030   0.150   0.150   0.150   0.150
S2c        -       0.620   -       -       0.030   0.030   0.150   0.150   0.150   0.150
S2d        -       0.620   -       -       0.030   0.030   0.150   0.150   0.150   0.150
S2e        -       -       0.520   -       0.025   0.025   0.125   0.125   0.125   0.125
S2f        -       -       0.520   -       0.025   0.025   0.125   0.125   0.125   0.125
S2g        -       -       -       0.520   0.025   0.025   0.125   0.125   0.125   0.125
S2h        -       -       -       0.520   0.025   0.025   0.125   0.125   0.125   0.125
Master3    0.520   -       -       -       0.025   0.025   0.125   0.125   0.125   0.125
Table 4: Results of Master-Slave System

Measure      Operational    Simulation    Error
C            10.452         10.732        -2.61%
E_Master1    1.070          1.059          1.04%
E_S1a        1.667          1.642          1.52%
E_S1b        1.430          1.390          2.88%
E_S1c        1.667          1.646          1.28%
E_S1d        1.430          1.401          2.07%
E_S1e        1.667          1.643          1.46%
E_S1f        1.430          1.427          0.21%
E_S1g        1.667          1.638          1.77%
E_S1h        1.430          1.425          0.35%
E_Master2    1.070          1.068          0.19%
E_S2a        1.677          1.659          1.08%
E_S2b        1.677          1.642          2.13%
E_S2c        1.677          1.632          2.76%
E_S2d        1.677          1.666          0.66%
E_S2e        1.421          1.432         -0.77%
E_S2f        1.421          1.398          1.65%
E_S2g        1.421          1.387          2.45%
E_S2h        1.421          1.430         -0.63%
E_Master3    1.070          1.075         -0.47%
I_Master1    0.000          0.000          0.00%
I_S1         1.070          1.059          1.04%
I_Master2    4.712          4.814         -2.12%
I_S2         5.782          5.882         -1.70%
I_Master3    9.382          9.657         -2.85%
Figure 7: Divide-And-Conquer System

Table 5: Service Demands of Divide-And-Conquer System on Multiprocessor

Task     P1      P2      P3      P4      IN1     IN2     M1      M2      M3      M4
Start    0.520   -       -       -       0.025   0.025   0.125   0.125   0.125   0.125
D1a      0.620   -       -       -       0.030   0.030   0.150   0.150   0.150   0.150
D1b      -       -       0.620   -       0.030   0.030   0.150   0.150   0.150   0.150
D2a      0.620   -       -       -       0.030   0.030   0.150   0.150   0.150   0.150
D2b      -       0.620   -       -       0.030   0.030   0.150   0.150   0.150   0.150
D2c      -       -       0.620   -       0.030   0.030   0.150   0.150   0.150   0.150
D2d      -       -       -       0.620   0.030   0.030   0.150   0.150   0.150   0.150
D3a      0.620   -       -       -       0.030   0.030   0.150   0.150   0.150   0.150
D3b      0.620   -       -       -       0.030   0.030   0.150   0.150   0.150   0.150
D3c      -       0.620   -       -       0.030   0.030   0.150   0.150   0.150   0.150
D3d      -       0.620   -       -       0.030   0.030   0.150   0.150   0.150   0.150
D3e      -       -       0.620   -       0.030   0.030   0.150   0.150   0.150   0.150
D3f      -       -       0.620   -       0.030   0.030   0.150   0.150   0.150   0.150
D3g      -       -       -       0.620   0.030   0.030   0.150   0.150   0.150   0.150
D3h      -       -       -       0.620   0.030   0.030   0.150   0.150   0.150   0.150
C2a      0.420   -       -       -       0.020   0.020   0.100   0.100   0.100   0.100
C2b      -       0.420   -       -       0.020   0.020   0.100   0.100   0.100   0.100
C2c      -       -       0.420   -       0.020   0.020   0.100   0.100   0.100   0.100
C2d      -       -       -       0.420   0.020   0.020   0.100   0.100   0.100   0.100
C1a      0.420   -       -       -       0.020   0.020   0.100   0.100   0.100   0.100
C1b      -       -       0.420   -       0.020   0.020   0.100   0.100   0.100   0.100
End      0.520   -       -       -       0.025   0.025   0.125   0.125   0.125   0.125


5 Summary 

This report has presented an efficient procedure for predicting the performance of
series-parallel-reducible task systems. The complexity of each iteration of the method is
$O(N^3 K)$, where N is the number of tasks and K is the number of resources in the
system. This complexity is a very significant reduction over earlier efforts, which either
dealt with very simple task structures or had methods with exponential complexity.
By using an operational approach, the performance measures of the system can be
estimated directly from measurable quantities without tracing all possible execution
paths. The procedure is accurate, as demonstrated by the three test cases presented:
the maximum observed error in the estimates from the procedure is within 4% of
simulation results in all three cases. The procedure also converges quickly: for the
test cases presented, convergence is achieved in fewer than 5 iterations.


Table 6: Results of Divide-And-Conquer System

Measure         Operational    Simulation    Error
C               12.002         11.994         0.07%
E_Start         1.070          1.083         -1.20%
E_D1a           1.407          1.401          0.43%
E_D1b           1.407          1.363          3.23%
E_D2a           1.504          1.503          0.07%
E_D2b           1.504          1.494          0.67%
E_D2c           1.504          1.479          1.69%
E_D2d           1.504          1.482          1.48%
E_D3a           1.653          1.665         -0.72%
E_D3b           1.653          1.640          0.79%
E_D3c           1.653          1.625          1.72%
E_D3d           1.653          1.640          0.79%
E_D3e           1.653          1.659         -0.36%
E_D3f           1.653          1.653          0.00%
E_D3g           1.653          1.642          0.67%
E_D3h           1.653          1.660         -0.42%
E_C2a           0.972          0.940          3.40%
E_C2b           0.972          0.955          1.78%
E_C2c           0.972          0.944          2.97%
E_C2d           0.972          0.936          3.85%
E_C1a           0.909          0.891          2.02%
E_C1b           0.909          0.900          1.00%
E_End           1.070          1.069          0.09%
I_Start         0.000          0.000          0.00%
I_D1a, I_D1b    1.070          1.083         -1.20%
I_D2a, I_D2b    2.477          2.484         -0.28%
I_D2c, I_D2d    2.477          2.446          1.27%
I_D3a, I_D3b    3.980          3.988         -0.20%
I_D3c, I_D3d    3.980          3.978          0.05%
I_D3e, I_D3f    3.980          3.925          1.40%
I_D3g, I_D3h    3.980          3.928          1.32%
I_C2a           6.460          6.373          1.37%
I_C2b           6.460          6.330          2.05%
I_C2c           6.460          6.312          2.34%
I_C2d           6.460          6.300          2.54%
I_C1a           8.566          8.550          0.19%
I_C1b           8.566          8.522          0.52%
I_End           10.932         10.925         0.06%


Appendices 


A Derivation of Equation 7 


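From equation 6,

$$\Pr(A < B) = \int_0^\infty [1 - F_B(x)]\, f_A(x)\, dx$$

Assume that A and B are two independent Erlang distributed random variables with
parameters $(\lambda_A, r_A)$ and $(\lambda_B, r_B)$, respectively. Then

$$f_A(x) = \frac{\lambda_A^{r_A}\, x^{r_A - 1}\, e^{-\lambda_A x}}{(r_A - 1)!}$$

and

$$F_B(x) = 1 - e^{-\lambda_B x} \sum_{k=0}^{r_B - 1} \frac{(\lambda_B x)^k}{k!}$$

Substituting the above two equations into equation 6,

$$\begin{aligned}
\Pr(A < B)
&= \int_0^\infty \left[\sum_{k=0}^{r_B - 1} \frac{(\lambda_B x)^k}{k!}\, e^{-\lambda_B x}\right] \frac{\lambda_A^{r_A}\, x^{r_A - 1}\, e^{-\lambda_A x}}{(r_A - 1)!}\, dx \\
&= \int_0^\infty \left[\sum_{k=0}^{r_B - 1} \frac{\lambda_B^k\, \lambda_A^{r_A}\, x^{r_A + k - 1}}{(r_A - 1)!\, k!}\right] e^{-(\lambda_A + \lambda_B) x}\, dx \\
&= \sum_{k=0}^{r_B - 1} \int_0^\infty \frac{\lambda_B^k\, \lambda_A^{r_A}\, x^{r_A + k - 1}}{(r_A - 1)!\, k!}\, e^{-(\lambda_A + \lambda_B) x}\, dx \\
&= \sum_{k=0}^{r_B - 1} \frac{\lambda_B^k\, \lambda_A^{r_A}}{(r_A - 1)!\, k!} \int_0^\infty x^{r_A + k - 1}\, e^{-(\lambda_A + \lambda_B) x}\, dx \\
&= \sum_{k=0}^{r_B - 1} \frac{\lambda_B^k\, \lambda_A^{r_A}}{(r_A - 1)!\, k!} \cdot \frac{(r_A + k - 1)!}{(\lambda_A + \lambda_B)^{r_A + k}} \\
&= \left(\frac{\lambda_A}{\lambda_A + \lambda_B}\right)^{r_A} \sum_{k=0}^{r_B - 1} \left(\frac{\lambda_B}{\lambda_A + \lambda_B}\right)^{k} \frac{(r_A + k - 1)!}{(r_A - 1)!\, k!}
\end{aligned}$$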
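
B Derivation of Equations 15 and 16

From equation 12, the equivalent probability distribution function of the parallel
combination is

$$F_{eq} = F_i\, F_j$$

Therefore, the equivalent probability density function is

$$f_{eq} = f_i\, F_j + f_j\, F_i$$

Assume that i and j are two independent Erlang distributed random variables with
parameters $(\lambda_i, r_i)$ and $(\lambda_j, r_j)$, respectively. Substituting these parameters into the
above equation,

$$\begin{aligned}
f_{eq}
&= \frac{\lambda_i^{r_i}\, x^{r_i - 1}\, e^{-\lambda_i x}}{(r_i - 1)!} \left[1 - e^{-\lambda_j x} \sum_{k=0}^{r_j - 1} \frac{(\lambda_j x)^k}{k!}\right]
 + \frac{\lambda_j^{r_j}\, x^{r_j - 1}\, e^{-\lambda_j x}}{(r_j - 1)!} \left[1 - e^{-\lambda_i x} \sum_{k=0}^{r_i - 1} \frac{(\lambda_i x)^k}{k!}\right] \\
&= f_i + f_j
 - \sum_{k=0}^{r_j - 1} \frac{\lambda_i^{r_i}\, \lambda_j^{k}\, x^{r_i - 1 + k}}{(r_i - 1)!\, k!}\, e^{-(\lambda_i + \lambda_j) x}
 - \sum_{k=0}^{r_i - 1} \frac{\lambda_j^{r_j}\, \lambda_i^{k}\, x^{r_j - 1 + k}}{(r_j - 1)!\, k!}\, e^{-(\lambda_i + \lambda_j) x} \\
&= f_i + f_j - S_1 - S_2
\end{aligned}$$

The equivalent mean of the parallel combination is

$$\begin{aligned}
E_{eq} &= \int_0^\infty x\, f_{eq}\, dx \\
&= \int_0^\infty x\, (f_i + f_j - S_1 - S_2)\, dx \\
&= \int_0^\infty x f_i\, dx + \int_0^\infty x f_j\, dx - \int_0^\infty x S_1\, dx - \int_0^\infty x S_2\, dx \\
&= E_i + E_j - \int_0^\infty x S_1\, dx - \int_0^\infty x S_2\, dx
\end{aligned}$$

Consider the third term of the above equation,

$$\begin{aligned}
\int_0^\infty x S_1\, dx
&= \int_0^\infty x \sum_{k=0}^{r_j - 1} \frac{\lambda_i^{r_i}\, \lambda_j^{k}\, x^{r_i - 1 + k}}{(r_i - 1)!\, k!}\, e^{-(\lambda_i + \lambda_j) x}\, dx \\
&= \lambda_i^{r_i} \sum_{k=0}^{r_j - 1} \frac{\lambda_j^{k}}{(r_i - 1)!\, k!} \int_0^\infty x^{r_i + k}\, e^{-(\lambda_i + \lambda_j) x}\, dx \\
&= \lambda_i^{r_i} \sum_{k=0}^{r_j - 1} \frac{\lambda_j^{k}}{(r_i - 1)!\, k!} \cdot \frac{(r_i + k)!}{(\lambda_i + \lambda_j)^{r_i + k + 1}} \\
&= \frac{\lambda_i^{r_i}}{(\lambda_i + \lambda_j)^{r_i + 1}} \sum_{k=0}^{r_j - 1} \left(\frac{\lambda_j}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_i + k)!}{(r_i - 1)!\, k!}
\end{aligned}$$

Similarly,

$$\int_0^\infty x S_2\, dx = \frac{\lambda_j^{r_j}}{(\lambda_i + \lambda_j)^{r_j + 1}} \sum_{k=0}^{r_i - 1} \left(\frac{\lambda_i}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_j + k)!}{(r_j - 1)!\, k!}$$

Therefore, the equivalent mean of the parallel combination is

$$E_{eq} = E_i + E_j
 - \frac{\lambda_i^{r_i}}{(\lambda_i + \lambda_j)^{r_i + 1}} \sum_{k=0}^{r_j - 1} \left(\frac{\lambda_j}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_i + k)!}{(r_i - 1)!\, k!}
 - \frac{\lambda_j^{r_j}}{(\lambda_i + \lambda_j)^{r_j + 1}} \sum_{k=0}^{r_i - 1} \left(\frac{\lambda_i}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_j + k)!}{(r_j - 1)!\, k!}$$

The equivalent variance of the parallel combination is

$$\begin{aligned}
V_{eq} &= \int_0^\infty x^2 f_{eq}\, dx - E_{eq}^2 \\
&= \int_0^\infty x^2 f_i\, dx + \int_0^\infty x^2 f_j\, dx - \int_0^\infty x^2 S_1\, dx - \int_0^\infty x^2 S_2\, dx - E_{eq}^2 \\
&= E_i^2 + V_i + E_j^2 + V_j - E_{eq}^2 - \int_0^\infty x^2 S_1\, dx - \int_0^\infty x^2 S_2\, dx
\end{aligned}$$

Consider the second last term in the above equation,

$$\begin{aligned}
\int_0^\infty x^2 S_1\, dx
&= \int_0^\infty x^2 \sum_{k=0}^{r_j - 1} \frac{\lambda_i^{r_i}\, \lambda_j^{k}\, x^{r_i + k - 1}}{(r_i - 1)!\, k!}\, e^{-(\lambda_i + \lambda_j) x}\, dx \\
&= \lambda_i^{r_i} \sum_{k=0}^{r_j - 1} \frac{\lambda_j^{k}}{(r_i - 1)!\, k!} \int_0^\infty x^{r_i + k + 1}\, e^{-(\lambda_i + \lambda_j) x}\, dx \\
&= \lambda_i^{r_i} \sum_{k=0}^{r_j - 1} \frac{\lambda_j^{k}}{(r_i - 1)!\, k!} \cdot \frac{(r_i + k + 1)!}{(\lambda_i + \lambda_j)^{r_i + k + 2}} \\
&= \frac{\lambda_i^{r_i}}{(\lambda_i + \lambda_j)^{r_i + 2}} \sum_{k=0}^{r_j - 1} \left(\frac{\lambda_j}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_i + k + 1)!}{(r_i - 1)!\, k!}
\end{aligned}$$

Similarly,

$$\int_0^\infty x^2 S_2\, dx = \frac{\lambda_j^{r_j}}{(\lambda_i + \lambda_j)^{r_j + 2}} \sum_{k=0}^{r_i - 1} \left(\frac{\lambda_i}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_j + k + 1)!}{(r_j - 1)!\, k!}$$

Therefore, the equivalent variance of the parallel combination is

$$V_{eq} = E_i^2 + E_j^2 + V_i + V_j - E_{eq}^2
 - \frac{\lambda_i^{r_i}}{(\lambda_i + \lambda_j)^{r_i + 2}} \sum_{k=0}^{r_j - 1} \left(\frac{\lambda_j}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_i + k + 1)!}{(r_i - 1)!\, k!}
 - \frac{\lambda_j^{r_j}}{(\lambda_i + \lambda_j)^{r_j + 2}} \sum_{k=0}^{r_i - 1} \left(\frac{\lambda_i}{\lambda_i + \lambda_j}\right)^{k} \frac{(r_j + k + 1)!}{(r_j - 1)!\, k!}$$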